    Enhancing CIDOC-CRM Models for GeoSPARQL Processing with MapReduce

    Spatial and temporal dimensions are two important characteristics of archaeological data and of cultural heritage in general. The ability to perform some form of reasoning on them is crucial during the analysis and interpretation process performed by domain experts. Many models have been defined in the literature to properly describe such data and support the subsequent interpretation process; among them, CIDOC CRM is a formal ontology specifically developed to represent cultural heritage information, and many extensions have been proposed in recent years to enrich this model. In particular, CRMgeo tries to bridge the gap between the cultural heritage domain and the geo-spatial domain by providing a link towards GeoSPARQL and by defining the constructs necessary for the representation of spatial data types and relations. Unfortunately, the current support for processing spatial functions through SPARQL query engines is still limited and many performance problems remain. The aim of this paper is twofold: (i) to evaluate the applicability of CRMgeo in representing spatial characteristics and relations of archaeological objects, and (ii) to propose a MapReduce procedure able to efficiently derive spatial relations between objects, in order to automatically enrich an RDF model with them and avoid the performance issues deriving from the use of a GeoSPARQL query engine.
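
    The procedure itself is not spelled out in this listing; as a rough illustration of the idea, a simulated MapReduce flow that grid-partitions geometries in the map phase, computes pairwise topological relations in the reduce phase, and emits GeoSPARQL-style triples could look like the following Python sketch. The grid size, the in-memory driver and the choice of predicates are assumptions made for the example, not details of the paper.

        # Sketch: derive topological relations between geometries with a
        # simulated MapReduce flow, then emit them as RDF-like triples.
        # Grid size, predicates and the in-memory "framework" are
        # illustrative assumptions, not the procedure from the paper.
        from collections import defaultdict
        from itertools import combinations
        from shapely.geometry import Point

        def map_phase(obj_id, geom, cell_size=10.0):
            """Assign each geometry to every grid cell its bounds overlap."""
            minx, miny, maxx, maxy = geom.bounds
            for cx in range(int(minx // cell_size), int(maxx // cell_size) + 1):
                for cy in range(int(miny // cell_size), int(maxy // cell_size) + 1):
                    yield (cx, cy), (obj_id, geom)

        def reduce_phase(objects):
            """Compute pairwise relations among the objects of one cell."""
            for (id_a, g_a), (id_b, g_b) in combinations(objects, 2):
                if g_a.touches(g_b):
                    yield (id_a, "geo:sfTouches", id_b)
                elif g_a.intersects(g_b):
                    yield (id_a, "geo:sfIntersects", id_b)

        # Driver simulating the shuffle between the two phases.
        data = {"find1": Point(1, 1).buffer(2), "find2": Point(3, 1).buffer(2)}
        cells = defaultdict(list)
        for obj_id, geom in data.items():
            for cell, value in map_phase(obj_id, geom):
                cells[cell].append(value)
        triples = {t for objs in cells.values() for t in reduce_phase(objs)}
        print(triples)  # precomputed relations ready to enrich the RDF model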

    A framework for integrating multi-accuracy spatial data in geographical applications

    In recent years the integration of spatial data coming from different sources has become a crucial issue for many geographical applications, especially in the process of building and maintaining a Spatial Data Infrastructure (SDI). In such a context, new methodologies are necessary in order to acquire and update spatial datasets by collecting new measurements from different sources. The traditional approach implemented in GIS systems for updating spatial data does not usually consider the accuracy of these data, but simply replaces the old geometries with the new ones. Applying such an approach in the case of an SDI, where continuous and incremental updates occur, will soon lead to a spatial dataset that is inconsistent with respect to spatial relations and relative distances among objects. This paper addresses this problem and proposes a framework for representing multi-accuracy spatial databases, based on a statistical representation of the objects' geometry, together with a method for the incremental and consistent update of the objects that applies a customized version of the Kalman filter. Moreover, the framework also considers the spatial relations among objects, since they represent a particular kind of observation that can be derived from geometries or observed independently in the real world. Spatial relations among objects also need to be compared during spatial data integration, and we show that they are necessary in order to obtain a correct result when merging object geometries.
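
    As a concrete illustration of the update idea (not the authors' full framework), a one-dimensional Kalman-style fusion of a stored coordinate with a new measurement of different accuracy can be sketched as follows; the variances are invented for the example.

        # Sketch: fuse a stored coordinate with a new measurement, weighting
        # by accuracy (variance), in the spirit of a Kalman filter update.
        # The numbers are invented for illustration.

        def kalman_update(x_old, var_old, x_meas, var_meas):
            """Return the fused estimate and its (smaller) variance."""
            k = var_old / (var_old + var_meas)        # Kalman gain
            x_new = x_old + k * (x_meas - x_old)      # weighted correction
            var_new = (1.0 - k) * var_old             # reduced uncertainty
            return x_new, var_new

        # Stored vertex known to +-2 m (var 4), new survey to +-0.5 m.
        x, var = kalman_update(100.0, 4.0, 101.2, 0.25)
        print(x, var)  # estimate moves close to the accurate measurement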

    A Balanced Solution for the Partition-based Spatial Merge join in MapReduce

    Several MapReduce frameworks have been developed in recent years in order to cope with the need to process an increasing amount of data. Moreover, extensions of them have been proposed to deal with particular kinds of information, such as spatial data. In this paper we refer to SpatialHadoop, a spatial extension of Apache Hadoop which provides a rich set of spatial data types and operations. In the geo-spatial domain, the spatial join is considered a fundamental operation for performing data analysis. However, the join operation is generally classified as a critical task to perform in MapReduce, since it requires processing two datasets at a time. Several different solutions have been proposed in the literature for efficiently performing a spatial join, which may or may not require the presence of a spatial index computed on both datasets or on only one of them. As already discussed in the literature, the efficiency of this operation depends on the ability both to prune unnecessary data as soon as possible and to assign a balanced amount of work to each task executed in parallel. In this paper, we take a step forward in this direction by proposing an evolution of the Partition-based Spatial Merge Join algorithm which tries to fully exploit the parallelism provided by the MapReduce framework. In particular, we concentrate on the partition phase, which has to produce filtered, balanced and meaningful subdivisions of the original datasets.
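
    The abstract does not detail the partition phase; one simple way to obtain balanced partitions, sketched below under the assumption that splitting on sample quantiles of one coordinate is acceptable, is to choose split positions so that each strip receives roughly the same number of geometries even on skewed input. Names and parameters are illustrative, not the algorithm of the paper.

        # Sketch: balanced spatial partitioning via sample quantiles of
        # the x coordinate, so each strip gets a similar share of the data.
        import random

        def quantile_splits(points, n_parts):
            """Compute x-axis split positions from a sample of the input."""
            sample = sorted(p[0] for p in random.sample(points, min(len(points), 1000)))
            step = len(sample) / n_parts
            return [sample[int(i * step)] for i in range(1, n_parts)]

        def assign(points, splits):
            """Map each point to the strip (partition) it falls into."""
            parts = [[] for _ in range(len(splits) + 1)]
            for p in points:
                idx = sum(1 for s in splits if p[0] >= s)
                parts[idx].append(p)
            return parts

        pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
        parts = assign(pts, quantile_splits(pts, 4))
        print([len(p) for p in parts])  # roughly 2500 each despite skew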

    A Spatio-Temporal Framework for Managing Archeological Data

    Space and time are two important characteristics of data in many domains. This is particularly true in the archaeological context, where information concerning the discovery location of objects allows one to derive important relations between findings of a specific survey, or even of different surveys, and temporal aspects extend from the excavation time to the dating of archaeological objects. In recent years, several attempts have been made to develop a spatio-temporal information system tailored for archaeological data. The first aim of this paper is to propose a model, called Star, for representing spatio-temporal data in archaeology. In particular, since in this domain dates are often subjective, estimated and imprecise, Star has to accommodate such vague representations by using fuzzy dates and fuzzy relationships among them. Moreover, besides the topological relations, another kind of spatial relation is particularly useful in archaeology: the stratigraphic one. Therefore, this paper defines a set of rules for deriving temporal knowledge from the topological and stratigraphic relations existing between two findings. Finally, considering the process through which objects are usually manually dated by archaeologists, some existing automatic reasoning techniques may be successfully applied to guide this process. For this purpose, the last contribution regards the translation of archaeological temporal data into a Fuzzy Temporal Constraint Network for checking the overall data consistency and reducing the vagueness of some dates based on their relationships with other ones.
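
    As an illustration of the kind of rule mentioned above (not the actual Star model), the following sketch represents a fuzzy date as a trapezoid over the time axis and applies one stratigraphic rule: a finding lying below another cannot have been deposited later, so the upper finding's date can be clipped accordingly. All names and values are invented.

        # Sketch: a fuzzy date as a trapezoid (support and core intervals)
        # plus one derivation rule: if finding A lies stratigraphically
        # below finding B, B's dating must not start before A's can.
        # Class and rule names are illustrative, not the Star model.
        from dataclasses import dataclass

        @dataclass
        class FuzzyDate:
            support_start: int  # earliest possible year (membership > 0)
            core_start: int     # earliest fully plausible year (membership = 1)
            core_end: int
            support_end: int    # latest possible year

        def apply_below_rule(lower: FuzzyDate, upper: FuzzyDate) -> FuzzyDate:
            """Stratigraphy: 'lower' was deposited before 'upper', so clip
            the upper finding's date to start no earlier than the lower's."""
            return FuzzyDate(
                max(upper.support_start, lower.support_start),
                max(upper.core_start, lower.core_start),
                upper.core_end,
                upper.support_end,
            )

        amphora = FuzzyDate(-100, -50, 50, 100)  # dated by typology
        wall = FuzzyDate(-200, -150, 80, 120)    # layer above the amphora
        print(apply_below_rule(amphora, wall))   # wall cannot predate the amphora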

    In-memory caching for multi-query optimization of data-intensive scalable computing workloads

    In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work. Instead of optimizing jobs independently, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization to improve the efficiency of data-intensive, scalable computing frameworks. By carefully selecting and exploiting common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Experiments on a prototype implementation of our system show significant benefits of work sharing for TPC-DS workloads.
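
    As a toy rendering of the selection step, the sketch below chooses which common subexpressions to cache under a memory budget using a plain 0/1 knapsack solved by dynamic programming; the multiple-choice formulation used by the paper is richer, and the names, sizes and savings are invented.

        # Sketch: pick subexpressions to cache under a memory budget.
        # 'size' is the cached result's footprint and 'saving' the total
        # recomputation cost avoided across the query batch.

        def select_cache(candidates, budget):
            """candidates: list of (name, size, saving); returns (saving, names)."""
            best = {0: (0, [])}  # memory used -> (total saving, chosen names)
            for name, size, saving in candidates:
                for used, (val, names) in list(best.items()):
                    new_used = used + size
                    if new_used <= budget:
                        cand = (val + saving, names + [name])
                        if cand > best.get(new_used, (-1, [])):
                            best[new_used] = cand
            return max(best.values())

        subexprs = [("scan_filter_A", 4, 30), ("join_AB", 6, 50), ("agg_C", 3, 10)]
        print(select_cache(subexprs, 8))  # best saving within 8 memory units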

    An Interoperable Spatio-Temporal Model for Archaeological Data Based on ISO Standard 19100

    Archaeological data are characterized by both spatial and temporal dimensions that are often related to each other and are of particular interest during the interpretation process. For this reason, several attempts have been made in recent years to develop a GIS tailored for archaeological data. However, despite the increasing use of information technology in the archaeological domain, the current situation is that each agency or research group independently develops its own local database and management application, which remains isolated from the others. Conversely, the sharing of information and the cooperation between different archaeological agencies or research groups can be particularly useful in order to support the interpretation process by using data discovered in similar situations w.r.t. spatio-temporal or thematic aspects. In the geographical domain, the INSPIRE initiative of the European Union supports the development of a Spatial Data Infrastructure (SDI) through which several organizations, such as public bodies or private companies, with overlapping goals can share data, resources, tools and competencies in an effective way. The aim of this paper is to lay the basis for the development of an archaeological SDI, starting from the experience acquired during the collaboration among several Italian organizations. In particular, the paper proposes a spatio-temporal conceptual model for archaeological data based on the ISO standards of the 19100 family and promotes the use of the GeoUML methodology in order to put such interoperability into practice. The GeoUML methodology and tools have been enhanced in order to suit the archaeological domain and to automatically produce several useful documents, configuration files and code starting from the conceptual specification. The applicability of the spatio-temporal conceptual model and the usefulness of the produced tools have been tested in three different Italian contexts: Rome, Verona and Isola della Scala.
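
    As a purely illustrative aside, one class of such a conceptual model, once translated to code, might pair a geometry with validity periods roughly as in the sketch below; every name here is invented, and the real model and the GeoUML-generated artifacts are far richer.

        # Sketch: a toy translation of one conceptual-model class into code,
        # pairing an ISO 19107-style geometry with temporal attributes.
        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class TimeInterval:
            start_year: Optional[int]  # None when the bound is unknown
            end_year: Optional[int]

        @dataclass
        class ArchaeologicalUnit:
            unit_id: str
            name: str
            geometry_wkt: str         # e.g. a GM_Surface serialized as WKT
            excavation: TimeInterval  # when the unit was excavated
            dating: TimeInterval      # estimated lifespan of the unit

        unit = ArchaeologicalUnit(
            "VR-001", "Roman wall segment",
            "POLYGON((0 0, 10 0, 10 2, 0 2, 0 0))",
            TimeInterval(2008, 2009), TimeInterval(-100, 300),
        )
        print(unit.unit_id, unit.dating)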

    Skewness-Based Partitioning in SpatialHadoop

    In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data. SpatialHadoop belongs to this group of projects and includes MapReduce implementations of spatial operators, like range queries and spatial join. The MapReduce paradigm is based on the fundamental principle that a task can be parallelized by partitioning data into chunks and performing the same operation on each of them (map phase), eventually combining the partial results at the end (reduce phase). Thus, the applied partitioning technique can tremendously affect the performance of a parallel execution, since it is the key point for obtaining balanced map tasks and exploiting the parallelism as much as possible. When uniformly distributed datasets are considered, this goal can easily be achieved by partitioning the geometries of the input dataset with a regular grid covering the whole reference space; conversely, with skewed datasets, this might not be the right choice and other techniques have to be applied. For instance, SpatialHadoop can also produce a global index by means of a Quadtree-based or an R-tree-based grid, which in turn are more expensive index structures to build. This paper proposes a technique, based on both a box-counting function and a heuristic rooted in theoretical properties and experimental observations, for detecting the degree of skewness of an input spatial dataset and then deciding which partitioning technique to apply in order to improve as much as possible the performance of subsequent operations. Experiments on both synthetic and real datasets are presented to confirm the effectiveness of the proposed approach.
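
    The box-counting idea can be sketched as follows: compute BC(r), the sum over grid cells of size r of the squared number of points per cell, and use the slope of log BC(r) against log r to judge how skewed the dataset is (for 2-D uniform data the slope approaches 2). The decision threshold in the sketch is an assumption, not the heuristic of the paper.

        # Sketch: estimate dataset skewness with a box-counting function
        # BC(r) = sum over grid cells of size r of count(cell)^2.
        import math, random
        from collections import Counter

        def box_count(points, r):
            """BC(r) with exponent 2: sensitive to how clustered the data is."""
            cells = Counter((int(x // r), int(y // r)) for x, y in points)
            return sum(c * c for c in cells.values())

        def skewness_slope(points, r1=1.0, r2=2.0):
            """Slope of log BC between two grid resolutions."""
            b1, b2 = box_count(points, r1), box_count(points, r2)
            return (math.log(b2) - math.log(b1)) / (math.log(r2) - math.log(r1))

        pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50_000)]
        slope = skewness_slope(pts)
        # Larger slopes suggest skew, so fall back to a Quadtree- or
        # R-tree-based partitioning instead of a regular grid. The
        # cut-off 2.5 is invented for this sketch.
        print("slope", slope, "->", "grid" if slope < 2.5 else "quadtree")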

    Tracking Data Provenance of Archaeological Temporal Information in Presence of Uncertainty

    The interpretation process is one of the main tasks performed by archaeologists who, starting from ground data about evidence and findings, incrementally derive knowledge about ancient objects or events. Very often more than one archaeologist contributes, at different time instants, to discovering details about the same finding; it is therefore important to keep track of the history and provenance of the overall knowledge discovery process. To this aim, we propose a model and a set of derivation rules for tracking and refining data provenance during the archaeological interpretation process. In particular, among all the possible interpretation activities, we concentrate on the dating that archaeologists perform to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis. In this context, we propose a framework to represent and derive updated provenance data about temporal information after the mentioned derivation process. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; we therefore use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model the relationships between the dating of different findings, represented as a graph-based dataset. The derivation rules used to infer more precise temporal intervals are enriched so as to also manage provenance information and its updates after a derivation step. A MapReduce version of the path consistency algorithm is also proposed to improve the efficiency of the refining process on big graph-based datasets.
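
    As an illustration of the refinement step (with crisp interval bounds standing in for the fuzzy constraints of the paper), one path-consistency tightening replaces the constraint between two findings with its intersection with the composition of the constraints passing through a third finding:

        # Sketch: one tightening step of path consistency on a network of
        # temporal distance constraints. Composition is interval addition
        # and revision is intersection; the values are invented.

        def compose(c1, c2):
            """Composition of two distance constraints [a, b] and [c, d]."""
            return (c1[0] + c2[0], c1[1] + c2[1])

        def intersect(c1, c2):
            lo, hi = max(c1[0], c2[0]), min(c1[1], c2[1])
            if lo > hi:
                raise ValueError("inconsistent network")
            return (lo, hi)

        # Dating offsets in years between three findings (i, j), (j, k), (i, k).
        c_ij, c_jk, c_ik = (0, 50), (10, 30), (-100, 200)
        c_ik = intersect(c_ik, compose(c_ij, c_jk))
        print(c_ik)  # (10, 80): the i-k dating constraint has been refined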